Data-Efficient Policy Search using PILCO and Directed-Exploration
Authors
Abstract
Reinforcement learning (RL) algorithms solve general sequential decision-making problems by learning through trial and error. Many RL algorithms are proven to find a good or optimal controller, but they may require many interactions with the environment to do so. For real-world tasks this is often impractical, because letting a learner interact with the environment takes time and can be costly.
Similar resources
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch ...
Data-Efficient Reinforcement Learning in Continuous-State POMDPs
We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distr...
Probabilistic Inference for Fast Learning in Control
How can we learn control tasks as fast as possible given knowledge from experience only? • autonomous learning in control from scratch using experience only (no demonstrations) • no task-specific prior assumptions • learn fast (data-efficient) model-based RL • deal with model bias during long-term planning: only small data sets available for learning dynamics models. 1 Key Idea and Algorithm • lear...
Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs
We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions under small noise exist, such as PILCO, which learns the cartpole swing-up task in 30 s. PILCO evaluates policies by planning state trajectories using a dynamics model. However, PILCO applies policies to the observed state, therefore planning i...
Safe Policy Search with Gaussian Process Models
We propose a method to optimise the parameters of a policy which will be used to safely perform a given task in a data-efficient manner. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. Our model has useful analytic properties, which allow closed form computation of error gradients and estimating the probability of violating given state space const...
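The PILCO entry above describes learning a probabilistic dynamics model whose uncertainty is carried into long-term planning. As a minimal sketch of what such a model provides, the Python below fits a Gaussian process (GP) to a handful of transitions from a hypothetical 1-D system and shows that its predictive variance is small near observed states and large far from them. Assumptions: the toy system `true_step`, the squared-exponential kernel, and the fixed hyperparameters are illustrative choices (PILCO itself learns kernel hyperparameters from data), and the sketch stops at one-step prediction rather than PILCO's multi-step uncertainty propagation.

```python
import numpy as np

def se_kernel(a, b, signal_var=1.0, length_scale=0.5):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-4):
    """GP posterior mean and variance at x_test, with fixed (assumed) hyperparameters."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = se_kernel(x_train, x_test)               # train-test covariances
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)              # clip tiny negative round-off

# Hypothetical toy system x_{t+1} = x_t + 0.1 * sin(x_t), observed at only five states.
def true_step(x):
    return x + 0.1 * np.sin(x)

x_train = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y_train = true_step(x_train) - x_train   # targets are state differences, as in PILCO

x_test = np.array([0.25, 3.0])           # one state near the data, one far from it
mean, var = gp_predict(x_train, y_train, x_test)
next_state_mean = x_test + mean          # predicted x_{t+1}
print(var[0] < var[1])                   # True: uncertainty grows away from the data
```

A planner that uses only `next_state_mean` would trust the prediction at `x = 3.0` as much as the one at `x = 0.25`; keeping `var` is what lets a PILCO-style method account for how little data supports each prediction.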
Publication date: 2016